SECTION 1


REPORT  SYNOPSIS 


A.  INTRODUCTION 

We report here the results of a study on implementing a low-power
filter using state-of-the-art CMOS technology. The basic goal is
to design a 1024-tap filter with programmable weights that has
linear phase, operates at a sample rate of 8 kHz, and consumes a
maximum of 2.0 mA at 3.6 V (7.2 mW). The input data word length
has been specified as between 8 and 12 bits.

The work performed here is based in part on an earlier study
on a low power filter done at HRL for NOSC.

In Section 1.B we present a summary of the results of the
study, with conclusions in Section 1.C. Section 2 contains the
technical supporting details of the report. The study is
divided into two main parts: Technology Issues (Section 2.B)
and Architectural Issues (Section 2.C). Under Technology Issues
we review the latest developments in CMOS technology and
compare the performance of CMOS/SOS and CMOS/bulk technologies.
We look at the voltage requirements of these technologies
to see if 3.6 V is an acceptable power source level. In
particular, the Hughes VHSIC CMOS/SOS process is examined.
In the Architectural Issues section we analyze each
component of the low power filter in terms of power consumption,
speed, and gate count when implemented with the Hughes CMOS/SOS
process. We also review state-of-the-art memory chips to
see if one could be suitably used for the low power filter.
Lastly, in Section 2.E, we provide a preliminary estimate of the
cost of fabricating the low power filter chip using Hughes
CMOS/SOS technology.
B.  SUMMARY OF RESULTS

1.  Technology

Since an important consideration for this filter chip is
extremely low power, CMOS technology is a natural candidate for
implementation. CMOS circuitry dissipates mainly dynamic
switching power (CV^2 f power) and negligible quiescent power. Of
the two CMOS technologies to choose from, CMOS/SOS and
CMOS/bulk, CMOS/SOS clearly provides better speed/power
performance above 1.5 um channel length, but below 1.0 um there
is evidence that CMOS/bulk performance becomes comparable with
CMOS/SOS. As channel lengths approach 1 um and below, the lower
line-to-substrate capacitance advantage held by CMOS/SOS is lost
as line-to-line interconnect capacitance becomes significant.
Progress is still being made to improve the speed/power
performance of both SOS and bulk technologies, and it is unclear
if CMOS/SOS would still be significantly better than CMOS/bulk
at submicron feature sizes.

As to supply voltage requirements for state-of-the-art
CMOS devices, we have reviewed the relevant literature and
conclude that the current Hughes VHSIC CMOS/SOS technology will
be able to operate from a 3.6 V power source.

We have also investigated using CCD technology for meeting
the data storage requirements of the filter. We used a scheme
that partitions the data for storage into 4 CCD registers and
runs each register at 8 MHz, but only for one-fourth of the
time, so that the effective operational rate is 2 MHz. When the
1500b bits (b is the word size) of data and coefficient storage
are implemented using current CCD technology, we calculate that
power dissipation would be 3.6b mW. This assumes that the
lowest acceptable clocking voltage, 6 V, is used. From this
analysis, we see that both power consumption and required
operating voltage would exceed filter specifications. Hence, we
do not recommend the use of CCD technology for data storage.
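
As a rough check on the CCD figures above, the following Python sketch
compares the quoted CCD storage power (3.6b mW) with the 7.2 mW budget
implied by 2.0 mA at 3.6 V. The constant and function names are
illustrative assumptions and do not appear in the study; only the
numbers are taken from the text.

```python
# Sketch: compare the CCD storage power quoted in the text with the filter budget.
SUPPLY_V = 3.6          # filter supply voltage (V), from the specification
MAX_CURRENT_MA = 2.0    # maximum supply current (mA), from the specification
CCD_MW_PER_BIT = 3.6    # CCD storage power per bit of word size (mW), from the text
CCD_CLOCK_V = 6.0       # lowest acceptable CCD clocking voltage (V), from the text


def power_budget_mw() -> float:
    """Total power allowed for the whole filter, in mW (V times mA)."""
    return SUPPLY_V * MAX_CURRENT_MA


def ccd_storage_power_mw(b: int) -> float:
    """CCD data and coefficient storage power for a b-bit word, in mW."""
    return CCD_MW_PER_BIT * b


if __name__ == "__main__":
    budget = power_budget_mw()
    for b in (8, 10, 12):
        p = ccd_storage_power_mw(b)
        print(f"b={b:2d}: CCD storage {p:5.1f} mW vs budget {budget:.1f} mW "
              f"({p / budget:.1f}x over)")
    print("clock voltage exceeds supply:", CCD_CLOCK_V > SUPPLY_V)
```

Even at the smallest word size the storage alone is several times the
whole-chip budget, which is the basis for the recommendation above.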
2.  Architectural Issues and Power Consumption


Assuming a single tap implementation for the low power
filter operating at 8 MHz, the major components of the filter,
along with power dissipation and device count for each component,
are listed in Table 1. The power calculations are based in part
on a low-power Toshiba 256K CMOS static RAM chip announced at
the ISSCC conference in February, 1984. The power consumption
for this chip was scaled down to meet the data and coefficient
storage requirements of the filter. The Hughes VHSIC CMOS/SOS
process parameters were used to calculate power dissipation in
the processor section. CV^2 f dynamic power is assumed to be the
primary source of power dissipation in this section. C is the
total capacitance of each component in the processor section, V
is taken as 3.6 V, and f is 8 MHz. In Table 1 the parameter b
is the word size, specified as between 8 and 12 bits. The total
power dissipation for the filter can be obtained by adding the
power dissipated within each component, resulting in

P_total = 1.74b^2 + 246b + 18.3 uW.

Similarly, the total device count is obtained by adding the
devices for each component, resulting in

D_total = 32b^2 + 9265b + 340.

The device count and power dissipation broken down by major
components for a 10-bit filter are shown in Table 2. Data and
coefficient storage accounts for 89% of the total
power consumption and 96% of the device count. The total power
consumption, device count and projected chip size for 8-, 10-
and 12-bit filters are shown in Table 3. The projected chip size
for the filter is obtained by estimating the area occupied by
RAM and by random logic, and is given by

S = 51.6b^2 + 949b + 548 mil^2.
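
The single-tap expressions can be evaluated directly. The sketch below
is a minimal Python illustration, not part of the study: the function
names are assumptions, and the coefficients are taken from the Total
row of Table 1 and from the chip size expression above. The formulas
return power in uW, so dividing by 1000 gives mW.

```python
# Sketch: evaluate the single-tap power, device count and chip size formulas.

def single_tap_power_uw(b: int) -> float:
    """Total power in uW for word size b (Total row of Table 1)."""
    return 1.74 * b**2 + 246 * b + 18.3


def single_tap_devices(b: int) -> int:
    """Total device count for word size b (Total row of Table 1)."""
    return 32 * b**2 + 9265 * b + 340


def single_tap_area_mil2(b: int) -> float:
    """Projected chip size in mil^2 for word size b."""
    return 51.6 * b**2 + 949 * b + 548


if __name__ == "__main__":
    for b in (8, 10, 12):
        print(f"b={b:2d}: {single_tap_power_uw(b) / 1000:.1f} mW, "
              f"{single_tap_devices(b) / 1000:.1f} K devices, "
              f"{single_tap_area_mil2(b) / 1000:.1f} K mil^2")
```

For b = 8, 10 and 12 these evaluate to roughly 2.1, 2.7 and 3.2 mW,
well inside the 7.2 mW budget.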
Table 1.  Device Count and Power for Single Tap Filter

Component        Device Count            Power (uW)

Storage          9200b                   241b
Adder/Acc.       42b + 340               2.26b + 18.3
Multr.           b(32b + 19)             1.74b^2 + 2.16b
Output Driver    4b                      0.53b
Total            32b^2 + 9265b + 340     1.74b^2 + 246b + 18.3
Table 2.  Device Count and Power Breakdown for 10-bit Filter

Component        Device Count        Power (mW)
                 (% of total)        (% of total)

Storage          92 K    (96%)       2.4     (89%)
Adder/Acc.       760     (0.8%)      0.041   (1.5%)
Multr.           3.4 K   (3.5%)      0.20    (7.3%)
Output Driver    40      (0.04%)     0.005   (0.19%)
Total            96.2 K  (100%)      2.7     (100%)

Table 3.  Total Power Consumption and Device Count
for an 8, 10 and 12 bit Filter

b            Power Consumption   Device    Projected Chip Size
(bits/word)  (mW)                Count     (mil^2) / (% memory)

8            2.1                 76.5 K    11.4 K  (59%)
10           2.7                 96.2 K    15.2 K  (56%)
12           3.2                 116 K     19.4 K  (52%)
There is a trade-off in terms of power consumption between
a single tap approach and a multi-tap approach. A multi-tap
approach operates at lower speed but requires more gates for
implementation, whereas a single tap approach would have to
operate at higher speed, but requires fewer gates. By
analyzing the computational requirements of a multi-tap
approach we find that the power consumption as a function of the
number of taps, N, is

P(b,N) = 243b/N + 1.73b^2 + 2.43b + 14 uW.

Here we see that only the power associated with data and
coefficient storage (the first term of the equation) decreases with N.
The power consumed in the processor section (the remainder of
the terms) remains constant because even though the processor
section is operating at lower speeds as N increases, the number
of processors operating simultaneously increases with N.

The total power consumption, device count and projected
chip size for a 10-bit filter are shown for 1 through 4 taps in
Table 4. The device count for an N-tap filter is given by

D(b,N) = 9229b + 80 + N(32b^2 + 45b + 260),

and the projected chip size is

S(b,N) = 876b + 129 + N(51.6b^2 + 72.6b + 419) mil^2.

The first two terms in D(b,N) and S(b,N) arise primarily from the
data and coefficient storage section, and the bracketed term in
both equations represents contributions from the processor
section.
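
The two expressions above can be tabulated in the same way as the
power. This Python sketch is illustrative only; the function names are
assumptions, and the coefficients are those of D(b,N) and S(b,N) as
read from the text.

```python
# Sketch: device count and projected chip size for an N-tap filter.

def multi_tap_devices(b: int, n: int) -> int:
    """Device count: storage part plus n copies of the processor section."""
    return 9229 * b + 80 + n * (32 * b**2 + 45 * b + 260)


def multi_tap_area_mil2(b: int, n: int) -> float:
    """Projected chip size in mil^2: storage part plus n processor sections."""
    return 876 * b + 129 + n * (51.6 * b**2 + 72.6 * b + 419)


def percent_increase(value: float, baseline: float) -> float:
    """Increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    a1 = multi_tap_area_mil2(10, 1)
    for n in (1, 2, 3, 4):
        d = multi_tap_devices(10, n)
        a = multi_tap_area_mil2(10, n)
        print(f"N={n}: {d / 1000:.0f} K devices, {a / 1000:.1f} K mil^2 "
              f"(+{percent_increase(a, a1):.0f}% area)")
```

The processor section holds only a few percent of the devices but a
large share of the area, so chip size grows much faster with N than
device count does.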

When using a multi-tap approach, chip yield must be
considered. From Table 4, one sees that going from a single to
a two-tap implementation reduces the power consumption by 46%,
but also increases chip size by 41%. This increase in chip size
may translate into a sizable reduction in fabrication yields.

Table 4.  Power Consumption, Device Count and Chip Size
for a 10-bit Filter Implemented with N Taps

Number of   Power Consumption   Device Count        Chip Size
taps, N     (mW)    (% dec.)    (count)  (% inc.)   (mil^2)  (% inc.)

1           2.6     -           96.2 K   -          15.2 K   -
2           1.4     46          100 K    4.3        21.5 K   41
3           1.0     62          104 K    8.4        27.8 K   83
4           0.82    68          108 K    12.8       34.1 K   124

To improve chip yields, the filter implementation may be
partitioned into two or more chips. This approach, however, would
require communications between chips at processor speeds. The
power consumed by chip drivers for handling inter-chip
communications would in all likelihood outweigh the power saved by
operating the filter at lower speeds.

C.  CONCLUSION 

The choice of a CMOS process for implementing the filter is
important. The two components of the filter having special
processing requirements are the RAM for data and coefficient
storage and the A/D converter (if it is placed on the same
chip). Choice of a technology for developing a low-power
integrated RAM design is important because the RAM portion of the
filter consumes most of the power on the chip. Also, the large
size of the RAM makes it desirable to use a high device density
technology. Our power and device density calculations for data
storage were based on a 256K static RAM from Toshiba. The
technology used is a two-level polysilicon, two-level metal, p-well
CMOS process, with channel lengths comparable to the Hughes
VHSIC SOS process. In order to meet the low power specification
for the filter, a similar CMOS process may have to be used to
develop a low power RAM for data storage in the filter.

In the area of A/D converter designs, significant leakage
currents in SOS devices can make it difficult to achieve
millivolt precision in CMOS/SOS A/D converters. However, recent work
at RCA has reduced the leakage currents of SOS devices
significantly. Hence it may now be possible to develop high precision
A/D converters using this improved CMOS/SOS technology.

We have also examined the trade-offs between a single tap
and a multi-tap approach for the filter. A single tap approach
requires a higher operational speed but fewer gates to
implement, whereas a multi-tap approach requires a lower operational
speed but more gates to implement. A multi-tap approach would
result in power savings if all the components could be
integrated on one chip, but would require much higher power if two
or more chips were needed for the implementation. The yield of
fabricated chips becomes an important consideration when chip
size is increased for multi-tap implementations. We estimate
that going from a single tap to two taps for a 10-bit filter
would decrease power consumption by 46%, but also increase
chip size by 41%.

Given the current state of the art in CMOS technology, the
development of the low power filter chip must be viewed as a
research effort. The low power requirement for the filter makes
it desirable to implement the filter on a single chip. However,
the large number of devices necessary for implementing the
filter makes chip yield a primary concern. Also, to
successfully meet the low power requirements of the filter, it
will be necessary to integrate a high density, low power RAM
technology with logic circuitry. This RAM technology, requiring
advanced processing techniques, is not widely available at this
time.

SECTION 2


TECHNICAL  ISSUES 


A.  INTRODUCTION 

In  the  preliminary  report  we  assumed  that  CMOS/SOS  would  be 
the  technology  for  implementing  the  low  power  filter.  In  this 
report  we  will  consider  the  trade-offs  between  CMOS/SOS  and 
CMOS/bulk  more  critically.  We  will  also  examine  the  operating 
voltage  requirements  of  CMOS  technologies  to  see  if  3.6  V  is  an 
acceptable  operating  voltage.  The  Hughes  VHSIC  CMOS/SOS  process 
is  a  good  example  of  what  is  possible  in  state-of-the-art  CMOS 
technologies,  so  we  will  present  the  performance  parameters  of 
this  process  as  being  representative  of  what  can  be  achieved  in 
CMOS/SOS technologies today. We will also analyze power
consumption  requirements  if  CCD  technology  is  used  for  data 
storage  on  the  filter  chip. 

Using  the  performance  parameters  for  the  Hughes  VHSIC 
CMOS/SOS  process,  we  will  obtain  speed,  power  consumption  and 
device  count  estimates  for  the  adder/accumulator,  multiplier  and 
output  drivers  on  the  filter  chip.  Hughes  does  not  have  a 
current effort to develop a 1.25 um static RAM chip, so to
obtain  power  and  device  count  estimates  for  data  and  coefficient 
storage  on  the  filter  chip,  we  will  examine  CMOS  static  RAM 
chips  currently  available  in  the  market.  We  will  also  look  at 
the  trade-offs  between  using  a  single  tap  and  a  multi-tap 
implementation  for  the  filter  chip. 

B.  TECHNOLOGY  ISSUES 

In the following sections we will review state-of-the-art
CMOS technologies and compare CMOS/SOS and CMOS/bulk in terms of
power consumption, speed and device technology. We will also
examine operating voltage requirements for these technologies,
and interconnect parasitic capacitances as feature sizes shrink
to submicron dimensions, to see how significant they are in
comparison to gate capacitances. Specifically, we will present
the performance characteristics for the Hughes VHSIC CMOS/SOS
process. Finally, we will investigate the use of CCD technology
for data storage.

1.  CMOS Technology

In the preliminary study it was assumed that CMOS/SOS would
be used for fabricating the low-power filter. It is generally
acknowledged that CMOS/SOS is lower power and faster than
CMOS/bulk technology at channel lengths greater than about
1.0 um. Below 1.0 um, however, there is evidence that CMOS/bulk
becomes competitive with CMOS/SOS. Traditionally CMOS/SOS is
used for radiation-hard applications, whereas CMOS/bulk is
usually used where chip yield and cost effectiveness are
factors. In choosing a technology for implementing the low power
filter, some consideration should be given to developing a
low-power, high-density static RAM and an A/D converter (if it is to
be on the same chip) using that technology. These are two
crucial components of the filter, and the successful
implementation of these components could depend on the choice of
a technology.

As is well known, CMOS/SOS technology derives its
superiority from the fact that there is virtually no capacitance from
interconnects to the non-conducting sapphire substrate
(Figure 1). This low capacitance translates into higher
switching speeds and lower dynamic power dissipation, two
important considerations in choosing a technology. CMOS/bulk,
on the other hand, does exhibit considerable interconnect
capacitance to the substrate (Figure 2). First, line-to-substrate
capacitance arises from metal and polysilicon wiring over field
oxide. This capacitance is usually about an order of magnitude
less than gate oxide capacitance. Second, there is diffused
line capacitance, primarily at the sidewall adjoining the field
oxide, with a typical value of about 4 x 10^-4 pF/um. The presence
of greater capacitance in CMOS/bulk circuits results in lower
performance of bulk circuits at device dimensions greater than
1.5 um. This is evident in Figure 3, which compares the
switching speed of a CMOS/SOS and a CMOS/bulk inverter as a
function of minimum feature size.

Figure 1.  Silicon-on-sapphire device structure.

As feature sizes are decreased to less than 1.5 um, the
performance of CMOS/bulk circuits becomes comparable to that of
CMOS/SOS circuits. First, the mobility of carriers in bulk
devices is greater than that in SOS devices. Because silicon
grown on sapphire contains many more defects than bulk silicon,
the scattering of carriers reduces their mobilities in SOS
devices to less than that of bulk devices. The effect of this
is a higher drive current for bulk devices than SOS devices at
equal channel lengths. Second, interconnect line-to-line
capacitance becomes significant as feature sizes decrease below
1 um (see Section 2.B.3). Since line-to-line capacitance is
present in both SOS and bulk technologies, the advantage of
lower line-to-substrate capacitance enjoyed by SOS circuits is
lost at submicron dimensions. As shown in Figure 3, the
switching speed of a CMOS/bulk inverter is comparable to that of
an equivalent SOS inverter at less than 1.5 um. Moreover, the
study at Hewlett-Packard comparing the performance of ring
oscillators on SOS and bulk technologies at a channel length of
1.3 um concluded that both speed and power dissipation were
about the same.

Improvements are still being made in both CMOS/SOS and
CMOS/bulk technologies. New crystal growth techniques have
reduced the defects in silicon grown on sapphire, thereby
increasing the mobilities of SOS devices. The latchup problem,
particularly severe in CMOS/bulk circuits as feature sizes
approach 1 um, is also being solved. Recent studies at
Toshiba indicate that CMOS/SOS circuits may still enjoy a
speed/power advantage over CMOS/bulk circuits at submicron
dimensions. How much of a performance advantage CMOS/SOS
circuits will still have at submicron dimensions is unclear at
this time. The Hughes VHSIC program has interest in both
CMOS/SOS and CMOS/bulk technologies, although current CMOS/SOS
development at 1.2 um channel lengths is at a more advanced
stage.

2.  Power  Supply  Voltage  for  CMOS  Filter  Chip 

One of the specifications for the low-power filter was that
it operate from a 3.6 V power supply. Current VHSIC CMOS
technology is targeted at 5.0 V operation. This may be partly
because of a desire to maintain voltage compatibility with other
digital logic families (particularly TTL), most of which operate
at 5.0 V. In this section we will consider whether it is
possible to operate current and future VHSIC CMOS technologies
at 3.6 V.

Some work has been done within the Hughes VHSIC program in
studying the operating voltage range of CMOS circuits as device
dimensions are scaled down. The result of this work is shown in
Figure 4. As device dimensions are scaled down, there are
several phenomena that limit the operating voltage range of CMOS
circuits. Among these are device punchthrough, oxide breakdown,
junction breakdown, device turn-on threshold, and excess thermal
generation. The VHSIC study identified the two factors that
limit MOSFET operation when device dimensions are optimally
scaled: junction breakdown and device turn-on voltage.

At the high end, junction breakdown occurs when
drain-to-substrate potentials exceed the breakdown voltage. This
phenomenon is well understood as avalanche breakdown in
reverse-biased pn junctions. When electric fields within the junction
reach a critical value (around 3 x 10^5 V/cm), carrier impact
ionization will cause a rapid increase in current flow through
the junction. Because substrate doping tends to increase as
device dimensions are scaled down (for threshold compensation
and to decrease depletion widths), critical breakdown fields are
reached at lower drain potentials. Hence, operating voltages
will have to be decreased as device dimensions are scaled down
to prevent junction breakdown.

At the low end, device operation is limited by the turn-on
voltage of devices. This parameter can be controlled to some
extent by changing the substrate doping by ion implantation and
gate oxide thickness. To maintain a satisfactory noise margin,
however, it is desirable to use an operating voltage above the
turn-on voltage by several times the thermal voltage, kT/q.
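
To give a sense of scale for the kT/q criterion, the Python sketch
below computes the thermal voltage and expresses the headroom between a
3.6 V supply and a 1.2 V turn-on voltage in units of kT/q. The 300 K
temperature is an assumption, the 1.2 V figure is the threshold voltage
the report quotes for the Hughes 1.2 um process, and the function names
are illustrative.

```python
# Sketch: thermal voltage kT/q and supply headroom above turn-on voltage.
BOLTZMANN_J_PER_K = 1.380649e-23     # Boltzmann constant k
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge q


def thermal_voltage_v(temp_k: float = 300.0) -> float:
    """Thermal voltage kT/q in volts (300 K is an assumed temperature)."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C


def headroom_in_thermal_voltages(supply_v: float, turn_on_v: float,
                                 temp_k: float = 300.0) -> float:
    """How many multiples of kT/q the supply sits above the turn-on voltage."""
    return (supply_v - turn_on_v) / thermal_voltage_v(temp_k)


if __name__ == "__main__":
    vt = thermal_voltage_v()
    n = headroom_in_thermal_voltages(3.6, 1.2)
    print(f"kT/q at 300 K = {vt * 1000:.1f} mV; headroom = {n:.0f} kT/q")
```

Under these assumptions the headroom is on the order of ninety
thermal voltages, so the low-end limit is comfortably met at 3.6 V.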

In Figure 4 the limits of operating voltage for CMOS/SOS
technology are plotted against the gate length of devices. The
shaded area represents the acceptable operating voltage range
for circuits. As can be seen from the figure, current VHSIC
technology at 1.25 um can be operated safely at 3.6 V. Moreover,
we believe that future VHSIC submicron technologies down to
0.5 um feature sizes can be operated at 3.6 V without any problem.

3.  Interconnect Parasitic Capacitances as a Function
    of Feature Size


In the preliminary study on the low power filter, it was
assumed that the parasitic capacitances arising from
interconnects were negligible compared to gate capacitances. We will
examine this issue in greater detail in this section.

In a study performed for NOSC in 1980 entitled "Develop
Submicron Devices," the parasitic capacitance arising from
wiring interconnects as device feature sizes were scaled down
was considered. In this study it was assumed that all
dimensions scale linearly with x, the field oxide thickness was
12.5 times the gate oxide thickness (t_g), and the width and
spacing of wiring was 1.5 times the gate length, L_ch. Under
these conditions the gate capacitance, given by

C_g = e_ox L_ch W_ch / t_g ,

decreases directly with L_ch.


Figure 5 shows the interconnect parasitic capacitances
relative to gate capacitances as device dimensions are scaled
down. There are two primary sources of wiring capacitances:
line-to-substrate (C_s) and line-to-line capacitances (C_m). Let
us first consider the case of short wiring interconnects within
cells, assuming they have an average length of 8 L_ch. As seen
from the figure, the ratio of C_s(8 L_ch)/C_g would be constant,
since the lengths of these wires would scale directly with gate
lengths. This ratio is found to be almost unity. Therefore,
line-to-substrate capacitance is significant in CMOS/bulk
technology. This capacitance, however, is negligible in
CMOS/SOS technology because there is no conducting substrate.
From the figure, the ratio C_m(8 L_ch)/C_g for short wires is seen
to be an order of magnitude less than 1 for gate lengths above
1 um, and increases to exceed 1 below 1 um. From this we can
conclude that for CMOS/SOS technology gate capacitance would be
dominant at a VHSIC gate length of 1.2 um, but below 1 um
line-to-line capacitance becomes significant.

Now let us consider the case of wiring interconnects with
dimensions comparable to chip size, L_c. These long
interconnects are evident in regular chip designs such as memory
chips and programmable logic arrays. As can be seen in
Figure 5, long wiring capacitances tend to dominate over other
capacitances. In the line-to-substrate case, the ratio
C_s(L_c)/C_g is about 100 at 5 um gate lengths and increases
rapidly to exceed 10^4 at submicron dimensions. In the
line-to-line case, C_m(L_c)/C_g is about 1 at 5 um and increases to be
comparable to C_s(L_c)/C_g at submicron dimensions. At 1.2 um
C_s(L_c) is about 10^[?] greater than C_g and C_m(L_c) is about
10^[?] greater than C_g. The contribution to total chip capacitance
from long wires will be significant in regularized structures
where there are many long wires spaced closely together, such as
in memory chips. In random logic chips, however, the collective
contribution from gate capacitances usually dominates total chip
capacitance. Figure 5 shows the total line-to-substrate
(C_s(chip)) and line-to-line (C_m(chip)) wiring capacitance
for a chip dominated by short wiring interconnects.

In the above analysis it was assumed that all feature sizes
scaled linearly with x. In practice, however, this would not be
desirable; technology limitations would prevent the scaling down
of certain feature sizes before others. For example, as the
gate oxide is scaled below 150 A, breakdown mechanisms begin to
occur, causing leakage currents across the oxide, and reducing
the reliability of devices. Below 50 A, direct quantum
mechanical tunneling of electrons across the oxide occurs.
Hence, for MOS devices to be useful at submicron dimensions, the
gate and field oxide thickness would have to be scaled down less
than linearly. Effectively, gate and wiring capacitances would
decrease more than linearly with x. As wire widths are scaled
down, however, the resistance of these wires increases
(Figure 6). At submicron dimensions, the resistance of these
wires would be great enough to introduce considerable RC time
delay in the propagation of signals. Also, as wire widths are
scaled down, electromigration failure becomes more prominent.
These two phenomena dictate that wire widths (and spacing) would
have to be scaled down less than linearly. This would result in
greater line-to-substrate capacitance (in CMOS/bulk circuits)
and less line-to-line capacitance than if scaling were done
linearly.
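
The growth of wire resistance under scaling can be illustrated with
the usual sheet-resistance relation R = R_sheet x (length / width).
The Python sketch below is a simplified illustration, not the model
behind Figure 6: it assumes constant resistivity, a rectangular wire,
and a sheet resistance that varies inversely with thickness. The
0.04 ohm/sq metal value and 2.4 um minimum metal width are the typical
figures listed for the Hughes process (the ohm/sq unit is assumed),
and the 1000 um wire length is a hypothetical example.

```python
# Sketch: wire resistance from sheet resistance, and its growth when the
# wire cross-section is shrunk by a linear scale factor x (0 < x <= 1).

def wire_resistance_ohms(sheet_ohms_per_sq: float, length_um: float,
                         width_um: float) -> float:
    """Resistance of a wire counted as (length / width) squares."""
    return sheet_ohms_per_sq * length_um / width_um


def scaled_resistance_ohms(r0_ohms: float, x: float,
                           length_scales: bool) -> float:
    """Resistance after width and thickness are both multiplied by x.

    Shrinking the thickness raises the sheet resistance by 1/x and shrinking
    the width adds another 1/x. A short local wire whose length also shrinks
    by x recovers one factor; a chip-length wire does not.
    """
    length_factor = x if length_scales else 1.0
    return r0_ohms * length_factor / (x * x)


if __name__ == "__main__":
    r0 = wire_resistance_ohms(0.04, 1000.0, 2.4)  # hypothetical 1000 um metal line
    print(f"unscaled: {r0:.1f} ohms")
    for x in (1.0, 0.5, 0.25):
        print(f"x={x}: long wire {scaled_resistance_ohms(r0, x, False):.0f} ohms, "
              f"local wire {scaled_resistance_ohms(r0, x, True):.0f} ohms")
```

Under these assumptions a chip-length wire grows in resistance as
1/x^2, which is one way to see why wire widths cannot follow the gate
length all the way down.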

Another source of interconnect capacitance in CMOS/bulk is
diffusion line capacitance. The contribution from diffusion
line capacitance tends to increase as feature sizes decrease.
This is because as feature sizes decrease, substrate doping is
increased for threshold compensation, resulting in smaller
depletion regions at diffusion junctions, and hence, greater
capacitance. The exact relationship between diffusion
capacitance and the scaling down of feature sizes is left for
further study.

Figure 6.  Interconnect resistance scaling.

This section has summarized the results of a study on
wiring capacitances as feature sizes are scaled down linearly.
Although feature sizes do not scale linearly in practice, we
will use the results of this study in our report and assume that
gate capacitance is dominant in CMOS/SOS circuits at the VHSIC
gate length of 1.2 um.

4.  VHSIC Technology

The current emphasis of the VHSIC program at Hughes is on
1.2 um CMOS/SOS technology. This technology has been
demonstrated with fabrication of a 72,000 device correlator chip for
the VHSIC program. Research is also underway to develop a
submicron SOS process at either 0.75 um or 0.5 um feature sizes.
The projections are that a submicron technology will be
available in 1987. There is also interest at Hughes Newport Beach in
CMOS/bulk technology, where a 3.0 um process is available at this
time. Also, a 1.2 um CMOS/bulk process is currently being
developed. In this section we will summarize the performance
characteristics of current VHSIC 1.2 um technology as applied to
the low-power filter design.

Table 5 summarizes the feature sizes and electrical
parameters of the Hughes 1.2 um CMOS/SOS process. The minimum
drawn gate length is 1.4 um, resulting in a channel length of
1.2 um after lateral diffusion at source and drain are taken
into account. The threshold voltages of the p and n-channel
devices are nearly identical in magnitude at 1.2 V. With gate
oxide thickness at 400 A, the gate oxide capacitance is
8.6 x 10^-4 pF/um^2. From Section 2.B.3, we can assume that gate
capacitance will be the primary source of power dissipation in
CMOS/SOS circuits. For a minimum geometry device of
2 um x 1.2 um, the capacitance per gate is

C_g = 8.6 x 10^-4 (pF/um^2) x 2 x 1.2 (um^2) = 2.1 x 10^-3 pF/gate.
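
The quoted oxide capacitance can be cross-checked from the oxide
thickness with the parallel-plate relation C_ox = e_ox / t_ox. The
Python sketch below is an illustration only: the relative permittivity
of 3.9 for silicon dioxide is a standard textbook value assumed here,
not a number given in the report, and the function names are
hypothetical.

```python
# Sketch: parallel-plate estimate of gate oxide capacitance and gate capacitance.
EPS0_F_PER_M = 8.854e-12  # permittivity of free space
K_SIO2 = 3.9              # relative permittivity of SiO2 (assumed textbook value)


def oxide_cap_pf_per_um2(t_ox_angstrom: float) -> float:
    """Oxide capacitance per unit area in pF/um^2 for a given thickness.

    1 F/m^2 equals 1 pF/um^2 (1e12 pF spread over 1e12 um^2), so the SI
    result needs no further conversion.
    """
    t_ox_m = t_ox_angstrom * 1e-10
    return K_SIO2 * EPS0_F_PER_M / t_ox_m


def gate_cap_pf(t_ox_angstrom: float, width_um: float, length_um: float) -> float:
    """Gate capacitance in pF for a device of the given width and channel length."""
    return oxide_cap_pf_per_um2(t_ox_angstrom) * width_um * length_um


if __name__ == "__main__":
    print(f"C_ox = {oxide_cap_pf_per_um2(400):.2e} pF/um^2")
    print(f"C_g  = {gate_cap_pf(400, 2.0, 1.2):.2e} pF/gate")
```

With a 400 A oxide this gives about 8.6 x 10^-4 pF/um^2 and about
2.1 x 10^-3 pF for a 2 um x 1.2 um gate, in line with the figures
above.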

Table 5.  Hughes VHSIC CMOS/SOS Process Parameters

Minimum Dimensions (um)

  Transistor Length        1.4   (Leff = 1.2)
  Transistor Width         2.0
  Metal Width              2.4
  Metal Spacing            2.6
  Polysilicon Width        1.4
  Polysilicon Spacing      2.2

Nominal Thickness (A)

  Silicon                  5000
  Poly Silicide            5500
  Field Oxide              5000
  Metal                    7500
  Gate Oxide               400

Electrical Parameters

Contact Resistances (ohms) for 2u x [?]u contact area

                           Max       Typical
  Metal/n+                 100       50
  Metal/p+                 30        10
  Metal/Poly               5         2
  n+/Metal/p+              250       100       ([?] x 4u contact area)

Sheet Resistances

                           Max       Typical
  Si                       100       40-60
  p+ Si                    200       100-115
  Poly Silicide            5         2.5-4.5
  Metal                    0.05      0.04

Electrical Parameters

                           n-channel            p-channel
  V_T                      1.2 V                -1.2 V
  C_ox                     8.6 x 10^-4 pF/um^2
  Mobility                 300 cm^2/V-sec       170 cm^2/V-sec
  g                        62 umhos             35 umhos
  R_ch                     16 Kohms/sq.         28.6 Kohms/sq.



The corresponding dynamic power dissipation for a minimum
geometry device operating with a supply voltage of 3.6V at 8MHz
is

Pg = 2.1 x 10^-3 (pF/gate) x (3.6V)^2 x 8 x 10^6 (Hz) x 50% / 2
   = 0.054 uW/gate.

The 50% factor arises from assuming that the device changes
state every other cycle. We will use these values for Cg and Pg
in Section 2.C for calculating the power consumption of
components for the low power filter.
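
The per-gate figures above can be checked numerically. The following Python sketch is an illustration only and is not part of the original study; the constant and function names are hypothetical, and the numbers are the ones quoted in the text and in Table 5.

```python
# Illustrative check of the per-gate capacitance and dynamic power.
# All names are hypothetical; the numbers come from the text and Table 5.

C_OX_PF_PER_UM2 = 8.6e-4  # gate oxide capacitance, pF/um^2
GATE_WIDTH_UM = 2.0       # minimum transistor width, um
GATE_LENGTH_UM = 1.2      # effective channel length, um
V_SUPPLY = 3.6            # supply voltage, V
F_CLOCK_HZ = 8e6          # processor clock, Hz
ACTIVITY = 0.5            # device changes state every other cycle


def gate_capacitance_pf(width_um=GATE_WIDTH_UM, length_um=GATE_LENGTH_UM):
    """Gate capacitance of one device in pF."""
    return C_OX_PF_PER_UM2 * width_um * length_um


def dynamic_power_uw(cap_pf, v=V_SUPPLY, f_hz=F_CLOCK_HZ, activity=ACTIVITY):
    """Dynamic power (1/2) C V^2 f times the activity factor, in uW."""
    watts = 0.5 * (cap_pf * 1e-12) * v ** 2 * f_hz * activity
    return watts * 1e6


if __name__ == "__main__":
    cg = gate_capacitance_pf()
    print(f"Cg = {cg:.2e} pF/gate")
    print(f"Pg = {dynamic_power_uw(cg):.3f} uW/gate")
```

Using the unrounded capacitance gives a power slightly below the quoted 0.054 uW, which follows from the rounded value of 2.1 x 10^-3 pF.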

From Table 5, the channel resistance in the linear region
is 16 Kohms/sq for an n-channel device and 28.6 Kohms/sq for a
p-channel device. These values are much greater than the
interconnect and contact resistances shown in the same table.
Therefore, in estimating the speed of circuit components for the
filter in Section 2.C, we will assume that channel resistance
and gate capacitance contribute the most to propagation delay in
the circuits.

5. CCDs for Data Storage

In the preliminary study of the low-power filter chip, it
was discovered that the memory for storage of data and filter
coefficients (about 1.5b Kbits total) consumed a significant
amount of power. In this section we will consider the use of
CCD shift registers to see if power for data storage can be
minimized.

Since  most  of  the  power  dissipated  in  a  CCD  shift  register 
is  CV^f  dynamic  power,  we  will  try  to  minimize  this  power  by 
assuming  a  design  based  on  four  256-word  shift  registers 
connected  as  shown  in  Figure  7.  Every  1/(8KHz)  second  the  B 
switches  connect  the  four  shift  registers  into  one  long  1024- 
word  shift  register,  and  a  new  datum  is  inserted  at  INPUT  into 
the shift register. The B switches are then flipped the other
way so that data within each of the four shift registers can
circulate within themselves. However, only one of the
256-word shift registers is circulating at one time, and as it
does so, data is fed through switch A at 8MHz to the multiplier
for correlation. The shift rate for each shift register is
8MHz, although each shift register stops for

(3 x 256) / 8MHz = 96 us

while data is being fed from the other three shift registers.
Effectively, each shift register is working at 2 MHz. The shift
register for coefficient storage would be treated in a similar
way.

Figure 7. CCD memory partitioned into four segments.

Table 6 lists the parameters for CCD technology developed
at Hughes. Using a minimum line width of 2.5 um and a cell size
of 10 um x 10 um results in an array area of 1.5 x 10^5 b um^2
for 1500b bits of data storage. This translates to 50b pF of
electrode capacitance. The minimum operating voltage of these
CCD circuits is 6V. Hence, the power required to operate this
memory array is

P = 50b pF x (6.0V)^2 x 2MHz = 3.6b mW.

For a minimum word size of 8 bits, the power consumed would be
approximately 29 mW, exceeding the low power filter requirement.
Also, the need to run the CCD array at a minimum of 6V tends to
rule out the use of CCD technology for the low power filter.
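
The CCD estimate can be reproduced with a few lines of Python. This is only a sketch under the assumptions stated in the text (four 256-word segments, 50b pF of electrode capacitance, 6V operation, CV^2f power); the names are hypothetical and do not come from the original report.

```python
# Illustrative estimate of CCD shift-register power for word length b.
# Names are hypothetical; parameter values are those quoted in the text.

SEGMENTS = 4                 # number of shift-register segments
WORDS_PER_SEGMENT = 256      # words held in each segment
SHIFT_RATE_HZ = 8e6          # shift rate while a segment circulates
CAP_PER_WORD_BIT_PF = 50.0   # electrode capacitance is 50b pF in total
V_CCD = 6.0                  # minimum CCD operating voltage, V


def stall_time_us():
    """Time a segment waits while the other segments are read out."""
    other_words = (SEGMENTS - 1) * WORDS_PER_SEGMENT
    return other_words / SHIFT_RATE_HZ * 1e6


def effective_rate_hz():
    """Average shift rate of one segment, as seen by the power estimate."""
    return SHIFT_RATE_HZ / SEGMENTS


def ccd_power_mw(word_bits):
    """C V^2 f power of the whole array in mW."""
    cap_farads = CAP_PER_WORD_BIT_PF * word_bits * 1e-12
    return cap_farads * V_CCD ** 2 * effective_rate_hz() * 1e3


if __name__ == "__main__":
    print(f"stall time: {stall_time_us():.0f} us")
    for b in (8, 10, 12):
        print(f"b = {b:2d}: {ccd_power_mw(b):.1f} mW")
```

Even at the minimum word length the result is several times the 7.2 mW budget for the whole filter.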

C.  ARCHITECTURAL  ISSUES 

In the preliminary report we advocated a single tap
approach for the low power filter (Figure 8). In the following
sections we will estimate the operational speed, power
dissipation, device count, and chip size of a single tap filter
when implemented using current Hughes VHSIC CMOS/SOS technology.
We will use the technology parameters presented in Section 2.B.4
of this report. In Section 2.C.6 we will examine the trade-offs
of a multi-tap implementation for the filter.

Table 6. Hughes CCD Technology

  Minimum Line Width                 2.5 um
  Cell Size                          10 um x 10 um
  Oxide Thickness                    1000 A
  Total Array Area (1500b bits)      1.5 x 10^5 b um^2
  Total Capacitance                  50b pF
  Power Consumption (@ 6V, 2 MHz)    3.6b mW

Figure 8. Single-tap implementation of low power filter.
(The figure is annotated with the convolution sum for y(n),
taken from m = 0 to 1024.)

In estimating the power dissipation of the filter
components, we will make two simplifying assumptions. First, we
will assume that parasitic capacitances for CMOS/SOS technology,
including line-to-line interconnect capacitances and
interconnect crossover capacitances, are small compared with
gate capacitances. From the data in Figure 5, this is a
reasonable assumption. In this figure we see that line-to-line
interconnect capacitance at 1.2 um is an order of magnitude less
than gate capacitance. Second, we will assume that in the case
of RAMs, power consumption is proportional to the size (number
of bits and chip area) of the RAM. This is not strictly true,
since in CMOS RAMs a considerable amount of power is dissipated
in driving the capacitances of long data and address lines.
These capacitances do not scale linearly with RAM size. However,
we will scale down power consumption for state-of-the-art CMOS
static RAM chips to obtain first order estimates of power
dissipation for data and coefficient storage in the filter.
Making these two assumptions for power dissipation will enable
us to estimate total power consumption for the filter without
laying out the components first.

In obtaining power consumption estimates for the filter,
we will assume that cell designs utilize minimum geometry
devices. Cells in the Hughes VHSIC library are usually designed
to drive large capacitive loads, and hence dissipate more power
and operate at higher speeds. In estimating the speed and power
for a filter design using minimum geometry devices, the VHSIC
values for power dissipation and operational speeds will be
scaled down accordingly. Also, in calculating the power
dissipation of cells in the filter, we will assume that only
half the electrical nodes in the cell change state during any
clock cycle. This provides a conservative estimate for power
dissipation, since probably fewer than 50% of the nodes change
state every cycle. For the purpose of estimating power
dissipation for the filter, however, we will use the
conservative value calculated using 1/2 CV^2f (50%).

1. Adder/Accumulator Configuration

In this section we will present the basic adder and
accumulator configurations to be used in the low-power filter.
The adder is a crucial element used repeatedly in the
accumulator and multiplier sections of the filter. We will
obtain estimates for the power, speed, and silicon area of the
adder and accumulator in terms of current VHSIC 1.2 um CMOS/SOS
technology.

The full adder cell being used in the Hughes VHSIC program
is shown in Figure 9. There are a total of 26 devices in the
cell, with channel widths ranging from 5 um to 17 um. For the
low power filter, however, we will assume a minimum geometry
device design with channel width of 2.0 um. Using the
parameters in Table 5 for the VHSIC 1.2 um CMOS/SOS process, the
total gate capacitance was calculated to be

C = 8.6 x 10^-4 (pF/um^2) x 2 um x 1.2 um x 26
  = 0.054 pF.

Total power dissipation for the cell is given by

Power = (1/2) C V^2 f (50% duty cycle),
        with C = 0.054 pF, V = 3.6V, f = 8MHz
      = 1.4 uW.

The 50% duty cycle arises from assuming that only half the nodes
in the cell change state every cycle.

The performance of this cell has been simulated as part of
the VHSIC effort. From the simulations it was found that the
propagation delay from input to the SUM output was approximately
8.5ns. Propagation delay from C_in to C_out for a cell was
3.6ns. These propagation delay times would be longer if minimum
geometry devices were used in the cell.


Accumulator 


The output of the multiplier has to be summed for 1024
cycles to obtain one convolution point. To perform this
summation, an accumulator with b+10 adders is provided, as shown
in Figure 10. The outputs of these adders are captured in shift
registers and fed back to the adders every clock cycle to be
added to the next output from the multiplier. The maximum carry
propagation through the accumulator using VHSIC device
geometries for b=12 bits is

T_acc = 22 x 3.6ns = 79.2ns,

much less than the cycle time for the filter. This propagation
delay would increase if minimum geometry devices are used to
implement the adders. If this delay becomes longer than the
filter cycle time, carry-lookahead techniques may be used to
decrease the delay. Table 7 lists the number of devices in the
accumulator and the power dissipation when minimum geometry
devices are used. The total power dissipated by the
adder/accumulator section is

P_acc = 2.26b + 18.3 uW,

and the total device count is

D_acc = 42b + 340.
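
As a cross-check, the accumulator totals can be rebuilt from the per-cell figures. The Python sketch below is illustrative only; the names are hypothetical, and it assumes the cell counts of Table 7 (b+10 adders and 2(b+5) shift register cells) together with the per-cell power and device counts quoted in this report (1.4 uW and 26 devices per adder, 0.43 uW and 8 devices per shift register cell).

```python
# Illustrative accumulator estimate built from per-cell figures.
# Names are hypothetical; cell data are those quoted in the report.

ADDER_DEVICES = 26      # devices per full adder cell
ADDER_POWER_UW = 1.4    # power per full adder cell at 8 MHz
SR_DEVICES = 8          # devices per shift register cell
SR_POWER_UW = 0.43      # power per shift register cell at 8 MHz
CARRY_DELAY_NS = 3.6    # simulated carry delay per adder (VHSIC geometry)
CYCLE_TIME_NS = 125.0   # filter cycle time at 8 MHz


def accumulator_estimate(word_bits):
    """Return device count, power and worst-case carry delay."""
    adders = word_bits + 10
    shift_regs = 2 * (word_bits + 5)
    return {
        "devices": ADDER_DEVICES * adders + SR_DEVICES * shift_regs,
        "power_uw": ADDER_POWER_UW * adders + SR_POWER_UW * shift_regs,
        "carry_delay_ns": CARRY_DELAY_NS * adders,
    }


if __name__ == "__main__":
    for b in (8, 10, 12):
        print(b, accumulator_estimate(b))
```

For b = 12 this reproduces the 79.2ns ripple delay, which leaves margin against the 125ns cycle only as long as the slower minimum geometry devices do not add more than about 45ns.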

Figure 10. Accumulator for adding 1024 b-bit words.

Table 7. Device Count and Power Dissipation for Accumulator

  Cell Type   # Cells      Devices    Total #        Power (uW)
                           per Cell   Devices
  Adder       b + 10       26         26 (b + 10)    1.4 (b + 10)
  SR          2 (b + 5)    8          16 (b + 5)     0.86 (b + 5)

2. Multiplier Configuration

In this section we will consider the multiplier for the
low-power filter in terms of the Hughes VHSIC 1.2 um technology.
There are several possible multiplier configurations (tree,
array, ROM-based), but the configuration based on a modified
Booth's algorithm seems to be best from the standpoint of power
dissipation, device count, speed, and layout geometry. This
algorithm uses a radix 4 method to examine the multiplier word 3
overlapping bits at a time. Partial products are accumulated in
half the number of steps necessary in other schemes. Moreover,
negative numbers are handled as well, in 2's complement form.
Table 8 shows the method for accumulating partial products based
on examining 3 bits of the multiplier.

The Booth multiplier, implemented in pipeline fashion, is
shown in Figure 11. The incoming multiplier words are stored in
shift  registers  at  the  right,  and  decoded  3  overlapping  bits  at 
a  time  by  the  Booth  Decoders  (BD's).  Depending  on  the  value  of 
the  3  bits,  the  control  lines  to  the  Select  circuits  are 
activated  to  add  either  0,  X,  2X,  -X  or  -2X  to  the  partial 
product.  The  actual  addition  is  performed  in  ripple  carry  form 
using  the  full  adder  described  in  the  previous  section.  Note 
that  only  (b/2-1)  rows  of  adders  are  needed  to  accumulate  the 
partial  products. 
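
The recoding rule of Table 8 can be illustrated with a short software model. The Python sketch below is not the hardware pipeline of Figure 11; it only shows, under the assumption of an even word length and 2's complement operands, how examining 3 overlapping bits at a time yields b/2 partial products drawn from 0, X, 2X, -X and -2X. The function names are hypothetical.

```python
# Software model of modified (radix 4) Booth recoding, per Table 8.
# Illustrative only; function names are hypothetical.

# (bit i+1, bit i, bit i-1) -> multiple of the multiplicand X to add
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(multiplier, bits):
    """Recode a 2's complement multiplier into bits/2 radix 4 digits."""
    if bits % 2:
        raise ValueError("word length must be even")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the word length")
    pattern = multiplier & ((1 << bits) - 1)  # 2's complement bit pattern
    digits = []
    previous = 0  # implicit zero to the right of the least significant bit
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(BOOTH_TABLE[(high, low, previous)])
        previous = high
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Accumulate the partial products selected by the Booth digits."""
    return sum(d * multiplicand * 4 ** k
               for k, d in enumerate(booth_digits(multiplier, bits)))


if __name__ == "__main__":
    print(booth_digits(-3, 4))        # least significant digit first
    print(booth_multiply(25, -3, 8))
```

Each digit selects one partial product, so a b-bit multiplier needs b/2 of them, which is consistent with the (b/2-1) rows of adders noted above.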

The worst case propagation delay through one stage of the
pipeline is the sum of set-up time for the Booth decoders,
select circuits, and carry propagation through the adders:

T_stage = T_bd + T_sel + 12*T_carry
This  propagation  delay  through  one  stage  would  be  63ns  if  VHSIC 
geometry  devices  were  used  in  the  design,  but  would  be  longer  if 
minimum  geometry  devices  were  used.  We  do  not  anticipate  that 
one  pipeline  stage  delay  will  exceed  the  filter  cycle  time  even 
if  minimum  geometry  devices  are  used  in  the  design. 

Table 9 lists the number of cells and devices used in the
pipelined Booth multiplier. The power dissipation shown for
each cell type is based on CV^2f(50%)/2 dynamic power, where C
is the total gate capacitance in the cell, V is 3.6V, and f is
8MHz. The 50% factor is included assuming only half the nodes
in the cell change state each cycle. The total power consumed
by the multiplier is

P_mult = b(1.74b + 2.16) uW,

and the total device count for the multiplier is

D_mult = b(32b + 19).

Table 8. Modified Booth's Algorithm for
Accumulating Partial Products

  Multiplier bits
  i+1   i   i-1      Add to Partial Product
   0    0    0        0
   0    0    1        X
   0    1    0        X
   0    1    1        2X
   1    0    0       -2X
   1    0    1       -X
   1    1    0       -X
   1    1    1        0

  X = Multiplicand

Table 9. Device Count and Power Consumption for
Components of Pipelined Booth Multiplier

                   Devices                      Total #           Power
                   Per Cell   # Cells           of Devices        (uW)
  Shift Register   8          (b/2)(5b/2 - 1)   4b(5b/2 - 1)      0.22b(5b/2 - 1)
  Select           18         b^2/2             9b^2              0.48b^2
  Booth Decoder    98         b/2               49b               3.8b
  Full Adder       26         b(b/2 - 1)        26b(b/2 - 1)      1.4b(b/2 - 1)

Figure 11. Pipelined Booth multiplier.
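
The multiplier totals can be rebuilt from the per-cell data of Table 9. The Python sketch below is illustrative and its names are hypothetical. One entry is an assumption: the shift register cell count is taken as (b/2)(5b/2 - 1), a value inferred here because it is the one that reproduces the stated device total b(32b + 19); the fraction is not legible in the scanned table.

```python
# Illustrative rebuild of the Booth multiplier totals from Table 9.
# Names are hypothetical. The shift register cell count is an
# assumption inferred from the stated device total b(32b + 19).

# cell type -> (devices per cell, power per cell in uW at 8 MHz)
CELLS = {
    "shift_register": (8, 0.43),
    "select": (18, 0.96),
    "booth_decoder": (98, 7.5),
    "full_adder": (26, 1.4),
}


def cell_counts(word_bits):
    """Number of cells of each type for an even word length b."""
    if word_bits % 2:
        raise ValueError("word length must be even")
    half = word_bits // 2
    return {
        "shift_register": half * (5 * half - 1),  # assumed, see above
        "select": word_bits * half,               # b^2 / 2
        "booth_decoder": half,                    # b / 2
        "full_adder": word_bits * (half - 1),     # b (b/2 - 1)
    }


def multiplier_totals(word_bits):
    """Return (total devices, total power in uW)."""
    counts = cell_counts(word_bits)
    devices = sum(CELLS[name][0] * n for name, n in counts.items())
    power_uw = sum(CELLS[name][1] * n for name, n in counts.items())
    return devices, power_uw


if __name__ == "__main__":
    for b in (8, 10, 12):
        devices, power_uw = multiplier_totals(b)
        print(f"b = {b:2d}: {devices} devices, {power_uw:.1f} uW")
```

The device totals match b(32b + 19) exactly; the summed power falls within about 2% of b(1.74b + 2.16) uW, the difference coming from rounding of the per-cell powers.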

3. Data and Coefficient Storage

In the preliminary study of the low-power filter it was
estimated that the data storage section would consume the most
power on chip. This estimate was based on a Hughes 16K static
RAM fabricated using 2.5 um CMOS/SOS technology about 3 years
ago. For a more accurate estimate of speed and power
dissipation of current CMOS memory chips, we surveyed the papers
presented at the most recent ISSCC conference (held February
1984). These papers are representative of what can be achieved
in memory design today.

Table 10 shows in summarized form the characteristics of
the low power RAMs presented at the ISSCC conference. Most of
these RAMs use CMOS technology, and the effective gate lengths
are comparable to current Hughes VHSIC gate lengths (1.2 um).
The access times of the static RAMs are in general better than
those for dynamic RAMs because charge sensing and refreshing of
dynamic memory cells require a longer cycle time. The access
times of the static RAMs would meet the requirements of the low
power filter easily, but the longer cycle times of the dynamic
RAMs would be a problem.

Table 10. Performance Characteristics of Current

The lowest power RAM in Table 10 is a 256K CMOS static RAM
developed by Toshiba (Appendix A). The gate lengths are 1.2 um
for n-channel devices and 1.5 um for p-channel devices (current
Hughes VHSIC technology uses 1.2 um for both n- and p-channel
devices). Access time for this chip is 46ns, much less than the
125ns required in the low power filter. Active power
dissipation measured at 1MHz is 10mW, and standby power is
0.03mW. If we scale the active power according to the
requirements for the low power filter (data storage = 1.5b
Kbits, voltage supply = 3.6V, speed = 8MHz), we obtain

P = 10 mW x (1.5b K / 256K) x (3.6V / 5V)^2 x (8MHz / 1MHz)
  = 0.243b mW.

From  this  calculation,  we  can  see  that  a  static  RAM  with 
1500b  bits  of  storage  could  be  designed  using  state-of-the-art 
CMOS  technology  that  would  meet  the  specifications  of  the  low 
power filter. Such a RAM would require approximately 9000b
devices to implement, assuming a design utilizing 6
devices/cell.
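
The scaling used for the RAM estimate is linear in storage size and clock rate and quadratic in supply voltage. The Python sketch below shows it as a small function; it is an illustration with hypothetical names, and it assumes the published RAM figures were measured at a 5V supply.

```python
# Illustrative first-order scaling of a published RAM power figure
# to the filter's requirements. Names are hypothetical; the 5 V
# reference supply is an assumption.

FILTER_KBITS_PER_WORD_BIT = 1.5  # storage is 1.5b Kbits
FILTER_SUPPLY_V = 3.6            # filter supply voltage, V
FILTER_CLOCK_MHZ = 8.0           # single tap processor rate, MHz


def scaled_ram_power_mw(ref_power_mw, ref_kbits, ref_clock_mhz,
                        word_bits, ref_supply_v=5.0):
    """Scale power with size, with voltage squared, and with clock rate."""
    size_ratio = FILTER_KBITS_PER_WORD_BIT * word_bits / ref_kbits
    voltage_ratio = (FILTER_SUPPLY_V / ref_supply_v) ** 2
    clock_ratio = FILTER_CLOCK_MHZ / ref_clock_mhz
    return ref_power_mw * size_ratio * voltage_ratio * clock_ratio


if __name__ == "__main__":
    # 256K RAM quoted at 10 mW active power at 1 MHz
    print(f"{scaled_ram_power_mw(10.0, 256.0, 1.0, word_bits=1):.3f}b mW")
    # the same RAM scaled for a 10-bit filter
    print(f"{scaled_ram_power_mw(10.0, 256.0, 1.0, word_bits=10):.2f} mW")
```

As the text cautions, this ignores the fact that data and address line capacitances do not scale linearly with RAM size, so the result is a first order estimate only.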

We  should  provide  a  note  of  caution  here.  The  technology 
used  for  fabricating  the  low  power  Toshiba  RAM  chip  is  a  two- 
level polysilicon, two-level metal, p-well CMOS process. Each
memory cell utilizes 4 transistors and 2 polysilicon resistor
loads.  These  polysilicon  resistors  require  tight  processing 
tolerances  so  that  the  resistances  would  be  relatively 
temperature  invariant,  and  uniform  resistances  are  maintained 
across  the  wafer.  This  requires  fairly  advanced  processing 
techniques  that  may  not  be  widely  available  yet.  Without  a  CMOS 
process  similar  to  this,  it  may  be  difficult  to  develop  a  RAM 
that  would  meet  the  low  power  specifications  of  the  filter. 

In  comparison,  the  next  lowest  power  consuming  RAM 
presented  at  the  ISSCC  conference  was  a  64K  CMOS  static  RAM  from 
Hitachi  (Appendix  B).  The  gate  length  for  both  n-  and  p-channel 
devices  in  this  chip  was  1.3  um.  The  power  consumption  measured 
for  this  chip  at  8MHz  was  about  150mW  (Figure  5,  Appendix  B). 

If we scale this down for the low power filter's requirements,
we obtain

P = 150 mW x (1.5b K / 64K) x (3.6V / 5V)^2 = 1.8b mW.

Obviously, this would exceed the filter specification for power
consumption. We stress here again that developing a RAM that
would meet the low power specification for the filter is
critically dependent on the availability of a suitable CMOS
technology.

Another  scheme  for  lowering  the  power  consumption  of  the 
data  storage  section  further  is  to  partition  the  memory  into 
segments  and  activate  one  segment  at  a  time,  when  data  is  needed 
from  it.  This  scheme  is  illustrated  in  Figure  12.  Since 
dynamic  power  is  proportional  to  capacitance,  C,  and  assuming 
that capacitance is proportional to area, A, then decreasing the
area of the memory activated at one time by a factor of 4 would result in
lowering  the  power  consumption  by  a  factor  of  4  as  well.  Each 
segment  of  the  partitioned  memory  still  operates  at  8MHz  when 
that  segment  is  activated,  but  as  far  as  power  consumption  is 
concerned,  the  effective  operational  rate  is  2MHz. 

4. Output Drivers

Output drivers can consume a significant amount of dynamic
power because of the relatively large off-chip capacitance they
have to drive. In this section we will consider the power
dissipated by output drivers in terms of current Hughes VHSIC
technology. This will determine whether it is possible to
divide the implementation of the filter into two or more chips
and yet maintain low power consumption.

Figure 13 shows a typical CMOS output driver handling an
off-chip capacitance of 20pF. The drive ratios of the output
devices have to be large enough to ensure that rise and fall
times of the off-chip signal are satisfactory. Here, they are
shown with 100:1 ratios, adequate to drive 20pF loads in less
than 30ns. Another beefed-up inverter is used to drive the
considerable capacitance at gates P1 and N1 (calculated to be
about 0.25pF). The drive ratios of P2 and N2 are chosen to be
10:1, presenting a gate capacitance of about 0.025pF to the
previous stage.

Figure 13. Output driver configuration.

... (i.e., 8MHz for a single tap implementation). The power
consumed by a single such driver would then be

P = 20.275pF x (3.6V)^2 x 8MHz x (50% duty cycle) / 2
  = 0.53 mW

Twelve such drivers operating in parallel (actually more than 12
would be needed to handle communications between data storage
and processor for a 12-bit implementation) would consume more
than 6mW. Comparatively, if the filter were totally integrated
on one chip, the output drivers would operate at 8KHz, and the
power consumed by each would be 0.53uW. Also, considerably
fewer output drivers would be needed in an integrated chip
approach, since no inter-chip communication would be necessary.
From this analysis, we strongly favor the single chip approach.
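
The comparison between inter-chip and single-chip output drivers can be checked with a short calculation. The Python sketch below is illustrative; the names are hypothetical, and the load is taken as the sum of the 20pF off-chip capacitance and the 0.25pF and 0.025pF gate capacitances of the two driver stages.

```python
# Illustrative output driver power at inter-chip and output rates.
# Names are hypothetical; capacitances are those quoted in the text.

OFF_CHIP_PF = 20.0      # off-chip load
STAGE1_GATE_PF = 0.25   # gates of P1 and N1
STAGE2_GATE_PF = 0.025  # gates of P2 and N2
V_SUPPLY = 3.6          # supply voltage, V
ACTIVITY = 0.5          # 50% duty cycle


def driver_power_mw(rate_hz):
    """Dynamic power of one output driver in mW."""
    cap_farads = (OFF_CHIP_PF + STAGE1_GATE_PF + STAGE2_GATE_PF) * 1e-12
    return 0.5 * cap_farads * V_SUPPLY ** 2 * rate_hz * ACTIVITY * 1e3


if __name__ == "__main__":
    per_driver = driver_power_mw(8e6)
    print(f"one driver at 8 MHz:  {per_driver:.2f} mW")
    print(f"twelve at 8 MHz:      {12 * per_driver:.1f} mW")
    print(f"one driver at 8 KHz:  {driver_power_mw(8e3) * 1e3:.2f} uW")
```

Twelve drivers running at the processor rate alone would use most of the 7.2 mW budget, which is the basis for preferring a single chip.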

5. Total Power Requirements, Device Count and Size

In this section we will summarize the results of preceding
sections and obtain estimates for total power consumption,
device count and chip size for the low power filter. The cell
types used in the filter are listed in Table 11, together with
the power dissipation and device count for each cell. Table 12
shows the breakdown for the major components of the filter in
terms of cell types. The power consumption and device count of
each major component is summarized in Table [number illegible].
Total power consumption for the filter is obtained by adding up
the power consumed by each component, resulting in

P_total = 1.74b^2 + [illegible]b + 18[illegible] uW.

Similarly, the total device count is obtained by adding up the
total number of devices in each component:

D_total = 32b^2 + 9265b + 340 devices.

Table 11. Power Dissipation and Device Count for
Cells Used in Low Power Filter

  Cells             Power @ 8 MHz (uW)    Device Count
  Shift Register    0.43                  8
  Full Adder        1.4                   26
  Select Cell       0.96                  18
  Booth Decoder     7.5                   98
  RAM               0.16                  6

Table 12. Power Dissipation and Device Count for Functional Blocks of Filter
SR-Shift Register, FA-Full Adder, SEL-Select, BD-Booth Decoder,
RAM-Random Access Memory, OD-Output Driver

This device count does not include circuitry for timing,
control, and memory address generation. To allow for these
extra components, an extra 5 to 10% must be added to the total
device count calculated above. The estimate for total power
consumption must be similarly increased to account for the extra
circuitry.

The power consumption and device count for a 10-bit filter
broken down by major components is shown in Table 2. One should
note that data and coefficient storage make up about 90% of the
device count and power consumed. Table 3 lists the total power
consumption and device count for an 8, 10 and 12 bit filter
using the equations for P_total and D_total derived above.

To estimate the size of the filter chip we have to estimate
the area taken up by RAM and the area taken up by random logic.
To estimate the device density for random logic, we note that
the Hughes VHSIC correlator chip contains 72,000 devices and
measures 315 x 368 mil^2. The resulting device density for this
chip is 0.62 device/mil^2. We will assume that this is a
representative density for random logic fabricated using 1.2 um
technology. To estimate the device density for memory arrays,
we note that the Toshiba 256K RAM mentioned in Section 2.C.3
contains approximately

256K cells x 4 devices/cell = 10^6 devices

on a chip measuring 6.68 x 8.86 mm (263 x 349 mil^2). The
device density for this chip is 10.9 devices/mil^2. The chip
size for the filter can be estimated by adding the area occupied
by RAM and the area occupied by the processor:

S = 9200b/10.9 + (32b^2 + 65b + 340)/0.62
  = 51.6b^2 + 949b + 548 mil^2.

The projected chip size for an 8, 10 and 12-bit filter is listed
in Table 3. Approximately half the chip area is taken up by
data and coefficient storage.
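
The chip size estimate combines two device densities with the device counts for storage and processor. The Python sketch below reproduces the arithmetic for illustration; the names are hypothetical, and the device counts (9200b for RAM, 32b^2 + 65b + 340 for the processor) are the ones used in the expression for S.

```python
# Illustrative chip area estimate from device counts and densities.
# Names are hypothetical; figures are those quoted in the text.

LOGIC_DENSITY = 0.62  # devices per mil^2, random logic
RAM_DENSITY = 10.9    # devices per mil^2, memory array


def device_density(devices, width_mil, height_mil):
    """Devices per square mil for a chip of the given size."""
    return devices / (width_mil * height_mil)


def ram_area_mil2(word_bits):
    """Area of data and coefficient storage."""
    return 9200 * word_bits / RAM_DENSITY


def processor_area_mil2(word_bits):
    """Area of multiplier, accumulator and remaining logic."""
    b = word_bits
    return (32 * b * b + 65 * b + 340) / LOGIC_DENSITY


def chip_area_mil2(word_bits):
    """Total estimated chip area."""
    return ram_area_mil2(word_bits) + processor_area_mil2(word_bits)


if __name__ == "__main__":
    print(f"logic density: {device_density(72000, 315, 368):.2f}")
    print(f"RAM density:   {device_density(1e6, 263, 349):.1f}")
    for b in (8, 10, 12):
        area = chip_area_mil2(b)
        share = ram_area_mil2(b) / area
        print(f"b = {b:2d}: {area:.0f} mil^2, RAM share {share:.0%}")
```

For a 10-bit filter the storage share comes out a little above one half, in line with the statement that about half the chip area goes to data and coefficient storage.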

6. Single Tap Versus Multiple Taps

In our preliminary study we chose a single tap
implementation (Figure 8) for the low power filter. A single
tap implementation involves buffering the incoming data samples
in memory and then multiplexing the data and filter coefficients
at high speed into the multiplier/accumulator portion of the
filter. Whereas the data samples are acquired at 8KHz, this
scheme requires the processor section to be run at 8MHz. A
multi-tap approach could be run at lower speeds, but would
require more gates to implement. Since dynamic power
dissipation in CMOS technology is directly proportional to total
capacitance and speed of operation, there is a trade-off between
using a multi-tap and a single tap approach.

We will now try to estimate the trade-off between using a
single tap and a multi-tap approach in terms of power
consumption. Figure 14 shows a four-tap implementation for the
low power filter. From this figure we see that even though data
and coefficient storage become partitioned as the number of
taps, N, increases, total data and coefficient storage remains
the same. Power consumed by the RAM can be estimated in the
same way as in Section 2.C.3:

P = (1.5b K / 256K) x (3.6V / 5V)^2 x ((f/N) / 1MHz) x 10 mW
  = 0.243b/N mW.

Here f is the operational speed for the single tap approach,
8MHz, and f/N is the operational speed for an N-tap filter. The
number of multipliers and adders would increase with the number
of taps, so the total capacitance for these components would be
N(C_m + C_a), where C_m and C_a are the total capacitances
associated with a multiplier and an adder, respectively. The
power dissipated by these components would be

P = N(C_m + C_a) V^2 (f/N)(50%)/2 = 25.9 x 10^6 (C_m + C_a).

Combining this with power dissipation for the RAM and results
from Sections 2.C.1 and 2.C.2, total power for an N-tap filter
is given by

P(b,N) = 243b/N + 1.73b^2 + 2.43b + 14 uW.

Hence, total power for an N-tap filter shows a decrease only
with the data and coefficient storage component. This
component, however, consumes 90% of filter power, and any power
savings here would be significant.

Table 4 lists the power consumption, device count and chip
size for a 10-bit filter implemented with 1 through 4 taps.
Device count is obtained with the help of Table 11,

D(b,N) = 9220b + 80 + N(32b^2 + 45b + 260),

and chip size obtained in a similar manner to Section 2.C.5,

S(b,N) = 9200b/10.9 + [20b + 80 + N(32b^2 + 45b + 260)]/0.62
       = 876b + 129 + N(51.6b^2 + 72.6b + 419) mil^2.

The percent increase or decrease in power consumption, device
count and chip size over the single tap approach is also noted
in Table 4. Going from a single tap to a two-tap filter would
decrease power consumption by 46%, but would also increase chip
size by 41%.
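
The trade-off between taps, power and area can be tabulated directly from the expressions for P(b,N), D(b,N) and S(b,N). The Python sketch below is illustrative only, with hypothetical names; it evaluates the three expressions and the percent change relative to the single tap case.

```python
# Illustrative evaluation of the N-tap trade-off for word length b.
# Names are hypothetical; the expressions are those given in the text.


def power_uw(b, taps):
    """Total filter power: only the storage term falls with N."""
    return 243.0 * b / taps + 1.73 * b * b + 2.43 * b + 14.0


def device_count(b, taps):
    """Total devices: the processor portion is replicated N times."""
    return 9220 * b + 80 + taps * (32 * b * b + 45 * b + 260)


def chip_area_mil2(b, taps):
    """Chip area from RAM and random logic device densities."""
    logic_devices = 20 * b + 80 + taps * (32 * b * b + 45 * b + 260)
    return 9200 * b / 10.9 + logic_devices / 0.62


def percent_change(new, old):
    """Signed percent change of new relative to old."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    b = 10
    for n in range(1, 5):
        dp = percent_change(power_uw(b, n), power_uw(b, 1))
        ds = percent_change(chip_area_mil2(b, n), chip_area_mil2(b, 1))
        print(f"N = {n}: {power_uw(b, n):7.0f} uW ({dp:+.0f}%), "
              f"{device_count(b, n)} devices, "
              f"{chip_area_mil2(b, n):7.0f} mil^2 ({ds:+.0f}%)")
```

The power saving shrinks with each added tap because only the 243b/N term falls, while the area grows by the same increment each time.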

Chip yield must be considered when using a multi-tap
approach. The low power filter is a substantial size chip even
when a single tap approach is used (Section 2.C.5). If extra
multipliers and adders are to be added for a multi-tap approach,
then fabrication yields would decrease as chip size increases.

A multi-chip implementation may be used, but this approach
requires inter-chip communication, and as shown in Section
2.C.4, significant power is consumed by output drivers handling
off-chip capacitances. We believe that this power will not be
compensated for by operating at a lower speed.

D.  OTHER  DESIGN  CONSIDERATIONS 

So far this report has assumed that the inputs are provided
in digital form, and the input signal does not contain aliasing.
This assumes that the original analog signal has been
"conditioned"; that is, it has been sent through an automatic
gain control (AGC) circuit and pre-filtered with a cut-off at
about 4KHz to prevent aliasing. Furthermore, the conditioned
signal has to be digitized. Whether the analog-to-digital (A/D)
converter is to be included as part of the low power filter chip
has to be considered. We have not examined the speed and power
requirements of the A/D converter in this report, but it is an
important part of the filter and needs further study. We
believe, however, that the device count and power dissipation in
an A/D converter would be small compared with the total device
count and power consumption for the filter.

E.  COST  OF  FABRICATION 

A preliminary estimate of the cost of fabricating the low
power filter chip using Hughes VHSIC CMOS/SOS technology has
been made by our Industrial Electronics Group (IEG) at Carlsbad.
This cost estimate is presented in Table 13, and includes
processing 2 lots of wafers, almost a necessity for a chip of
the filter's complexity. Errors in the first design will be
eliminated in the second lot. A tentative schedule for
fabrication of the filter chip over a 24 month period is shown
in Figure 15.

Table 13. Cost of Fabrication for Filter Using the
Hughes VHSIC CMOS/SOS Power Process

  Design
    Design and Layout                           300K
    Simulation                                   35K
    Generate CALMA tape, PG tape, mask set       37K
                                                372K
  Test (test hardware, test program,
    generate test vectors)                       65K
  Reiteration (redesign and new masks)           40K
  Processing (2 lots, parametric measure
    and probe)                                   70K
  Assembly and Test                              25K
  Project Engineer                               50K
  [item illegible]                               40K

Figure 15. Tentative schedule for processing low power filter chip.

APPENDIX A


TOSHIBA  256  K  CMOS  STATIC  RAM 

ISSCC  84 /THURSDAY,  FEBRUARY  23,  1984 /  CONTINENTAL  BALLROOMS  5-9  / THPM  15.1 


[See page 340 for Figure 1.]


SESSION XV: STATIC RAMs


Ch«ifrn*n  ®  .•'S'C 


THPM 15.1: A 46ns 256K CMOS RAM

Mitsuo Isobe, Junichi Matsunaga, Takayasu Sakurai, Takayuki Ohtani,

Kazuhiro Sawada, Hiroshi Nozawa, Tetsuya Iizuka, Susumu Kohyama

Semiconductor Device Engineering Laboratory
one of 16 sections. The section word line is activated by the
main word line and a column select signal. Since only 32
memory cells connected to one section word line are accessed
in a cycle, column current flows only in a selected section. In
addition, an RC time delay of each section word line is reduced
to 1/256 compared with conventional arrangements. Therefore
the total word line delay is reduced to 8.0ns from 30ns, as is
the case of conventional 4 block word line configurations.
The circuit design is realized by utilizing double aluminum
structure.
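
The 1/256 figure is consistent with a simple length-scaling argument, sketched below. The sketch assumes that word line resistance and capacitance are each proportional to line length, so dividing a line into 16 sections scales each by 1/16 and the RC product by 1/256. The function name is hypothetical, and the paper's own derivation is not given in the surviving text.

```python
def section_rc_ratio(num_sections: int) -> float:
    """RC product of one section relative to an undivided word line.

    Assumption: resistance and capacitance are both proportional to
    line length, so each scales by 1/num_sections and their product
    scales by 1/num_sections**2.
    """
    if num_sections < 1:
        raise ValueError("num_sections must be at least 1")
    return 1.0 / (num_sections ** 2)


# 16 sections, as stated in the text.
ratio = section_rc_ratio(16)

# 16 sections of 32 cells each gives the number of cells along one full row.
cells_per_row = 16 * 32

print(f"RC ratio per section: 1/{round(1 / ratio)}")
print(f"Cells per row: {cells_per_row}")
```

The quoted total delay (30ns to 8.0ns) improves far less than 256 times, presumably because the total also includes the main word line and selection logic; the text does not break this down.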

A schematic diagram of a memory cell and peripheral circuits
is illustrated in Figure 4. A two-stage current mirror type CMOS
sense amplifier is used to achieve high speed read operation.
The first stage amplifies a small signal from one of the four bit
line pairs. The second stage amplifies the first stage output
signal to a large swing level.

The bit lines and the first stage amplifier output are
equalized by the chip activating pulse before the read operation.
To improve fabrication yield, a redundancy circuit is
employed without any speed degradation.

The oscillograph of the address input and data output signal
waveforms at Vdd = 5V with [...] load capacitance is shown
in Figure 5, which indicates a 46ns address access time.
PARAMETERS                   64K CMOS RAM       256K CMOS RAM

PROCESS

DOUBLE LEVEL POLY-Si

DOUBLE LEVEL Al
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0  5  Mf" 

0 
1st Al (WIDTH/SPACING)       2 μm / 2 μm        1.2 μm / 1.6 μm

1st CONTACT HOLE             2 μm x 2 μm        1.2 μm x 1.2 μm

2nd Al (WIDTH/SPACING)       -                  2.0 μm / 2.0 μm

2nd CONTACT HOLE             -                  2.0 μm x 2.0 μm

TABLE 2


OPERATION  ASYNCHRONOUS (ADDRESS ACTIVATED CLOCKED OPERATION), AUTO POWER DOWN FUNCTION

ORGANIZATION  32K WORDS x 8 BIT

REDUNDANCY  4  SPARE  ROWS 

CHiP  Size  6  68  »  0  06  iniTi 

CEll  SiZE  1 '■  n  'J  StiiTi 

:/G  ;n'^erface  ccwpat'Ble 

ADDRESS ACCESS TIME  46 ns

ACTIVE POWER  10 mW (1 MHz)

STANDBY POWER  30 μW

PACKAGE  STANDARD 28 PIN DIP

TABLE 1 - Typical characteristics of the 256Kb CMOS RAM.

TABLE 2 - Design rules and device parameters of the 256Kb
CMOS RAM.
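
The active power entry in Table 1 can be turned into an energy-per-cycle figure, which is convenient when considering cycle rates other than 1 MHz. The sketch below takes the damaged entries as 10 mW at 1 MHz and 30 μW standby. It assumes active power scales linearly with cycle rate, which is the usual first-order behaviour of CMOS dynamic power but is not stated in the table. The function names are hypothetical.

```python
ACTIVE_POWER_W = 10e-3     # Table 1 active power, read here as 10 mW
REFERENCE_RATE_HZ = 1e6    # Table 1 quotes active power at 1 MHz
STANDBY_POWER_W = 30e-6    # Table 1 standby power, read here as 30 uW


def energy_per_cycle_joules() -> float:
    """Energy drawn per memory cycle, from the 1 MHz table entry."""
    return ACTIVE_POWER_W / REFERENCE_RATE_HZ


def estimated_active_power_watts(cycle_rate_hz: float) -> float:
    """First-order estimate of active power at another cycle rate.

    Assumption: power is proportional to cycle rate. Standby power is
    not added, so very low rates will be underestimated.
    """
    if cycle_rate_hz < 0:
        raise ValueError("cycle_rate_hz must be non-negative")
    return energy_per_cycle_joules() * cycle_rate_hz


# Energy per cycle in nanojoules, and an example at 100 kHz in milliwatts.
print(f"{energy_per_cycle_joules() * 1e9:.1f} nJ per cycle")
print(f"{estimated_active_power_watts(100e3) * 1e3:.2f} mW at 100 kHz")
```

Under these assumptions each cycle costs about 10 nJ, and the estimate falls to the standby level of 30 μW at roughly 3 kHz, below which the linear model is no longer meaningful.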


FIGURE 3 - Supply current versus operating frequencies.

FIGURE 4 - Schematic of memory cell and peripheral
circuits.
[See page 343 for Figure 3.]


SESSION  XV:  STATIC  RAMs 

THPM 15.5: A 20ns 64K CMOS SRAM

Oisrrtu  V'oafo  Tosh-o  Sssa*  /oja.o  TetSL.ya 

L  ra 

"'jiTvO  Japan 


Recently, several circuit techniques have been
combined with scaling to realize MOS static RAMs having speed
comparable to bipolar RAMs. It has also been possible
to realize bipolar RAMs with densities comparable
to MOS devices. This paper will report on a 64K x 1b CMOS
static RAM with a 20ns typical address access time and [...]
power dissipation.

The RAM performance has been achieved by the development
of a pulsed word-line technique and double P-well [...]
circuitry. Also, gate length of MOS
transistors was scaled down to [...]μm.

The pulsed word-line technique is illustrated in Figure 1.
[...] part of the RAM is controlled by clocks [...].
The RAM circuitry is activated by the
internal clock formed by detecting all address and [...].
[...] basic clock for the RAM circuitry operation
is φD, which controls word lines through the X decoder and
sense amplifiers. For precharge and equilibration of data lines,
precharge clock PC is [...] by φD and equivalent to φD.
When the basic clock φD becomes high, a selected word
line becomes high because the X decoder is activated. At the
same time the amplifiers are activated by φD. At this
moment, since the clock [...] high, the precharge transistor
works as the [...] data line load. Consequently small differential
data [...] will appear on the data lines. These
data of the cell are transferred to the amplifier, which
consists of two stages of two single-ended active load differential
amplifiers, then to the output buffer. This operation is
[...]; all operations begin after the data lines
are equilibrated. Then precharge clock PC goes low, cutting all
data line loads. From this transition no dc current is consumed
by the cell. After the signal is transferred to the output buffer,
the data is latched to the output buffer by the DL clock.
Further, this clock pulls down all word lines to the low state.
[...] moment data lines are immediately precharged by the PC
clock to prepare for the subsequent data read. This circuit
technique differs from existing data line equilibration techniques
and latched column techniques in that the word line is kept
high when a specific address is read. The former suffers from
large current through memory cells, although fast access is
achieved. The latter achieves low power by latching the signal
at the column by pulling one of the bit lines to a completely
low state, resulting in slow access time and large bit line recovery
time. Pulsed-word-line (PWL) techniques make it possible to
reduce current through the transmission gate of the cell by the
PC clock and to obtain fast access time by static operation
during which both φD and PC are high. Also the PWL technique
reduces the data line recovery time.

O'Connor, K.J., et al., "A [...]ns 4Kx1 NMOS Static RAM", ISSCC
DIGEST OF TECHNICAL PAPERS, p. 104-105; Feb., [...].

Minato, O., et al., "Hi-CMOS II 4K Static RAM", ISSCC DIGEST
OF TECHNICAL PAPERS, p. 14-15; Feb., 1981.

[...], K., et al., "A [...] Static [...] RAM", ISSCC DIGEST
OF TECHNICAL PAPERS, p. [...]; [...].

[...], et al., "A 64K [...] Static RAM", [...] DIGEST OF
TECHNICAL PAPERS, p. [...]; [...].

[...], A., "A 64K [...] RAM", ISSCC DIGEST OF TECHNICAL
PAPERS, [...].

The φD, PC clock generators and output buffer use a bipolar
(BCMOS) configuration to assure fast risetime. The bipolar
transistor is formed in a thin P well to realize high fT.
Thus, this technology utilizes double P wells: one for NMOS
transistors and the other for high performance bipolar transistors.
Risetime capability of the bipolar device is
[...] ns/pF. This is three times greater than that of bipolar
devices formed in the usual P well.

Third generation CMOS (Hi-CMOS III) technology has been
developed. Used are N- and P-channel MOS transistors having
[...]μm typical gate length and [...]μm design rule. Basically,
this technology is a [...] percent reduction in size, both
horizontally and vertically, of the original Hi-CMOS II technology,
which utilizes a 2μm design rule.

The memory cell is a cross-coupled [...] flip-flop
with high resistance loads. The cell is [...]μm² ([...]μm x [...]μm).
A photomicrograph of the chip is shown in Figure [...]. The
die measures [...]mm x [...]mm. To achieve fast access time the
RAM is organized so that the array is split into four planes of
64 columns x 256 rows. Corresponding to these arrays, four
sense amplifiers are laid out and power switched according to
address A14 and A15.
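
The array organization described above can be illustrated with a small address decoder. The text states only that the planes' sense amplifiers are switched by A14 and A15. The assignment of A0 to A5 as the column address and A6 to A13 as the row address is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import Tuple

NUM_PLANES = 4        # four planes, per the text
PLANE_COLUMNS = 64    # columns per plane, per the text
PLANE_ROWS = 256      # rows per plane, per the text


def decode_address(addr: int) -> Tuple[int, int, int]:
    """Split a 16-bit address into (plane, row, column).

    From the text: A14 and A15 select which plane's sense amplifier
    is powered. Assumed here: A0-A5 select the column and A6-A13
    select the row within the plane.
    """
    if not 0 <= addr < NUM_PLANES * PLANE_ROWS * PLANE_COLUMNS:
        raise ValueError("address out of range for a 64K x 1 array")
    plane = (addr >> 14) & 0x3    # A15:A14
    row = (addr >> 6) & 0xFF      # A13:A6 (assumed)
    column = addr & 0x3F          # A5:A0 (assumed)
    return plane, row, column


# Four planes of 64 x 256 cells account for all 64K bits.
total_bits = NUM_PLANES * PLANE_ROWS * PLANE_COLUMNS
print(total_bits)
print(decode_address(0x4000))
```

Because only the plane selected by A14 and A15 has its sense amplifier powered, three of the four amplifiers draw no active current in any given cycle.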

A typical 20ns address access time was achieved with a [...]
nominal power dissipation at 1MHz cycle time. Chip select
access time is 22ns. The RAM output waveforms for a typical
[...]pF load capacitance are shown in Figure 4. Supply current
versus operating frequencies are shown in Figure [...]. Active
power dissipation at low operating frequencies is reduced by the
aid of the pulsed word line (PWL) techniques. Typical features
of the RAM are summarized in Table 1. This RAM has realized
a speed comparable to bipolar [...] 64K [...],
though it consumes much less power.
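
The reduction in low-frequency power from the pulsed word line can be pictured with a duty-cycle model, sketched below. It assumes that dc current through the cell and data line loads flows only while the word line pulse is active, so the average current is the dc current multiplied by the fraction of the cycle occupied by the pulse. The function name and all numerical values are hypothetical and are not taken from the paper.

```python
def average_dc_current_ma(i_dc_ma: float, pulse_ns: float, cycle_ns: float) -> float:
    """Average dc current when current flows only during the word line pulse.

    Assumption: a constant current i_dc_ma flows for pulse_ns out of
    every cycle_ns and is zero for the rest of the cycle.
    """
    if cycle_ns <= 0:
        raise ValueError("cycle_ns must be positive")
    if pulse_ns < 0:
        raise ValueError("pulse_ns must be non-negative")
    # The pulse cannot occupy more than the whole cycle.
    duty = min(pulse_ns / cycle_ns, 1.0)
    return i_dc_ma * duty


# Hypothetical values: 10 mA dc current, 50 ns pulse, 1 MHz cycle (1000 ns).
pulsed = average_dc_current_ma(10.0, 50.0, 1000.0)

# Word line held high for the whole cycle, as in a fully static design.
static = average_dc_current_ma(10.0, 1000.0, 1000.0)

print(f"pulsed: {pulsed:.2f} mA, static: {static:.2f} mA")
```

In this model the saving grows as the cycle time lengthens relative to the pulse width, which matches the statement that the benefit appears at low operating frequencies.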

Acknowledgment

Iheau*horswts>»n*if>ankM  Kulio  ^  A*a'  '  \onesama 
A  K»‘S4  arid  I  A  A-ao  f***  'heir  guefaroe  >*'0  N  Ha«2iifvioOi 
A  *^agai  '  Aamam"t<.i,,r»he  devKe‘af,ri.ai..ui 
FIGURE 4 - RAM output waveforms